CDF confidence bands - DKW inequality

2022-03-08 ยท 2 min read

    \[ \renewcommand{\Pr}[1]{\text{Pr}\left[ #1 \right]} \def\eps{\varepsilon} \]

    Overview #

    Let's say I have a well known distribution defined by a CDF $F(x)$. I also sample an empirical CDF $F_n(x)$ from a population. Given some confidence parameter, the DKW Inequality bounds how close a random ECDF $F_n(x)$ will be to the true CDF from which the samples are drawn.

    Inequality #

    The inequality itself states the probability that the Kolmogorov-Smirnov Statistic violates some bound (the K-S statistic is just $||F_n(x) - F(x)||_{\text{max}}$)

    \[ \Pr{\sup_{x} \; \Bigl|F_n(x) - F(x)\Bigr| > \eps} \le 2 e^{-2 n \eps^2} \qquad \text{for every } \eps > 0. \]

    Note: there is an extended inequality for a multivariate $F$. The bound changes from $2 e^{(\cdots)}$ to $2k \, e^{(\cdots)}$, where $k$ is the vector dimension.

    CDF Confidence Bands #

    We can use the DKW Inequality to produce a "band" around our true CDF that must contain our empirical CDF with some confidence probability.

    With a failure probability $\alpha$, our sampled ECDF will sit outside the interval

    \[ \Bigl[ F(x) - \eps,\; F(x) + \eps \Bigr] \qquad \text{where } \eps = \sqrt{\frac{\ln \frac{2}{\alpha}}{2n}} \]

    Pros #

    1. allows us to contain the entire CDF with a single band value.
    2. non-parametric - requires no assumptions about the sample distribution.

    Cons #

    1. provides weak guarantees at the tails, tighter bounds around the median
    2. pointwise approaches (what?) can provide tighter bounds at each individual value
    3. non-parametric - provides weaker bounds than others with more assumptions.

    Example Bounds #

    $n$$\alpha$$\eps$
    100.10.38702275602049496
    100.010.5146997846583985
    100.0010.6164779987778186
    1000.10.12238734153404082
    1000.010.16276236307187292
    1000.0010.19494746035204052
    10000.10.038702275602049495
    10000.010.05146997846583985
    10000.0010.06164779987778186
    100000.10.012238734153404082
    100000.010.016276236307187292
    100000.0010.019494746035204052
    200000.10.008654091913011426
    200000.010.011509037065006824
    200000.0010.013784867119002347
    1. DKW Inequality (wikiwand)
    2. ENS PSL Fact Sheet 1: DKW Inequality
    3. On the tight constant in the multivariate DKW Inequality
    4. Kolmogorov-Smirnov Statistic (wikiwand)